A reference genome of Commelinales provides insights into the commelinids evolution and global spread of water hyacinth (Pontederia crassipes)

Abstract
Commelinales belongs to the commelinids clade, which also comprises Poales, the order that includes the most important monocot species, such as rice, wheat, and maize. No reference genome of Commelinales is currently available. Water hyacinth (Pontederia crassipes or Eichhornia crassipes), a member of Commelinales, is one of the most devastating aquatic weeds, although it is also grown as an ornamental and medicinal plant. Here, we present a chromosome-scale reference genome of the tetraploid water hyacinth with a total length of 1.22 Gb (over 95% of the estimated size) across 8 pseudochromosome pairs. Using representative genomes, we reconstructed a phylogeny of the commelinids, which supported Zingiberales and Commelinales being sister lineages of Arecales and shed light on the controversial relationship of the orders. We also reconstructed ancestral karyotypes of the commelinids clade and confirmed that the ancient commelinids genome had 8 chromosomes rather than 5 as previously reported. Gene family analysis revealed contraction of disease-resistance genes during polyploidization of water hyacinth, likely a result of fitness requirements for its role as a weed. Genetic diversity analysis using 9 water hyacinth lines from 3 continents (South America, Asia, and Europe) revealed very closely related nuclear genomes and almost identical chloroplast genomes, and provided clues about the global dispersal of water hyacinth. The genomic resources of P. crassipes reported here contribute a crucial missing link of the commelinids species and offer novel insights into their phylogeny.


Introduction
Pontederia crassipes (NCBI: txid44947, former name Eichhornia crassipes), commonly known as water hyacinth, belongs to Pontederiaceae of the Commelinales and is a perennial floating plant with light blue or purple flowers. P. crassipes is an allopolyploid with 32 chromosomes (2n = 4x = 32) [1]. Water hyacinth originated from the Amazon Basin, South America, and has spread through the tropics and subtropics since the 1800s, reaching a pan-tropical distribution [2]. It is recognized as an exceedingly aggressive aquatic plant species that exhibits rapid growth and possesses the capacity for both sexual and asexual reproduction [3]. Although restricted to freshwater environments, it utilizes nutrients effectively, so it flourishes particularly in ecosystems with high nutrient loading, consequently outcompeting native plant species for space and sunlight [3][4][5][6]. As a result, it has been recognized by the International Union for Conservation of Nature as one of the 100 most invasive species and has been listed among the 10 most serious weed plants in the world [7, 8].
Commelinales is a branch of the commelinids clade, which also comprises Poales, Zingiberales, and Arecales. Many members of commelinids, such as rice, wheat, and maize, provide calorie-rich grains, livestock feed, and industrial raw materials [9][10][11]. A phylogenetic tree of the commelinids has been constructed based on plastid genomes [12]. However, the controversy surrounding the phylogeny of commelinids, especially the placement of Poales and Commelinales, persists [12][13][14]. This uncertainty is attributed to discordance between nuclear and organellar phylogenies, which may arise from hybridization, incomplete lineage sorting, gene duplication, and gene loss [15][16][17]. Nuclear-plastid conflicts are prevalent at different taxonomic levels of angiosperms, such as the placement of the Celastrales, Malpighiales, and Oxalidales clade and the commelinids [13, 18, 19]. Many genomes of the economically important members of the commelinids have been sequenced, for example, the grasses (Poaceae), gingers and bananas (Zingiberales), and palms (Arecaceae). However, no reference genome within the Commelinales order has been generated up to now, which has hindered the elucidation of the phylogenetic puzzle of commelinids. Here, we generated a chromosome-scale genome assembly of P. crassipes, investigated genome evolution of P. crassipes in relation to its related species to determine the phylogeny and ancestral karyotype of the commelinids, and further explored the genetic diversity and phylogenetic relationships of water hyacinth using materials collected from several countries.

Genome assembly, phasing, and annotation
We sequenced a P. crassipes individual (Zijingang#1) collected from Hangzhou, China. The estimated genome size of P. crassipes was ∼1,058 Mb based on a k-mer survey using Illumina short reads (Supplementary Fig. S1A) and ∼1,278 Mb based on flow cytometry (Supplementary Fig. S1B-D), consistent with its C-value of 1.28 pg/1C [20]. The heterozygosity level of the P. crassipes genome was estimated to be 0.76%, and repetitive content accounts for 68.85% of the genome (Supplementary Fig. S1A). Based on 68 Gb (52×) of HiFi reads with an average read length of 17.72 kb, a de novo assembly yielded a genome of 1.30 Gb, including 1,699 contigs with a contig N50 size of 39.5 Mb (Supplementary Table S1).
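For readers less familiar with the contig N50 statistic and the sequencing depth quoted above, the following minimal Python sketch shows how both are conventionally computed. The function name `n50` and the toy contig lengths are illustrative assumptions, not part of the authors' pipeline; only the 68 Gb read yield and the 1.30 Gb assembly size come from the text.

```python
def n50(lengths):
    """Return the N50: the contig length at which the cumulative sum of
    contigs, sorted from longest to shortest, first reaches half of the
    total assembly length."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy contig lengths in Mb (not the real assembly).
toy_contigs_mb = [40, 30, 20, 5, 5]
toy_n50 = n50(toy_contigs_mb)

# Depth implied by the figures in the text: 68 Gb of HiFi reads
# over a 1.30 Gb assembly.
hifi_depth = 68 / 1.30

print(toy_n50)            # 30
print(round(hifi_depth))  # 52
```

In the toy example the two longest contigs (40 + 30 Mb) already cover half of the 100 Mb total, so the N50 is 30 Mb.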
With the 130 Gb of Hi-C data generated in this study, we assembled the genome of P. crassipes by anchoring 606 contigs to 16 superscaffolds (pseudochromosomes) with a total length of 1.22 Gb, representing 95.3% of the estimated size (Table 1). Because the genome is tetraploid, the high collinearity between the 2 subgenomes makes assembly challenging and severely reduces the reliability of regular ordering methods. Given that allopolyploids contain subgenome-specific sequences, we searched for subgenome-specific sequences (k-mers) and then clustered the specific sequences that differentiate homoeologous chromosomes, which enabled consistent partitioning of the genome into 2 subgenomes (Supplementary Fig. S2). Consequently, the 16 superscaffolds were assigned to the 2 subgenomes, termed subA and subB. After phasing and ordering with directional interactions, we finally assembled the genome of P. crassipes with the sizes of subA and subB being 640.2 Mb and 577.6 Mb, respectively, and the sizes of the pseudochromosomes ranging from 45.81 to 104.49 Mb (Table 1).
The quality of the assembly was validated by mapping the genomic short reads obtained by Illumina sequencing to the assembly, with 98.51% of the reads mapped. The long terminal repeat assembly index score was 11.78, indicating a reference quality comparable with those of Arabidopsis (TAIR10) and Vitis vinifera [21, 22]. We also estimated base-level accuracy and completeness of the genome and achieved a high assembly consensus quality value (42.0). High genomic synteny was observed between the P. crassipes assembly and other genomes within the commelinids clade (e.g., Cocos nucifera) (further details are available in the following sections). Taken together, the results supported the reliability of the P. crassipes assembly.
A total of 65,299 genes were predicted in the P. crassipes assembly by applying a combination of homology-based, transcript-based, and ab initio gene prediction approaches, after filtering out 732.02 Mb (56.09%) of repetitive sequences. Subsequently, we identified 33,608 and 31,691 genes in subgenomes A and B, respectively. BUSCO was used to assess the completeness of our genome annotation, which revealed that the gene set we annotated encompassed 1,536 (95.2%) of the 1,614 universal single-copy genes present in the Embryophyta lineage [23] (Supplementary Table S2).
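The summary statistics reported in this section can be cross-checked with simple arithmetic. The short Python sketch below does so using only numbers quoted in the text; the variable names are illustrative, and the choice of the flow cytometry estimate (∼1,278 Mb) as the denominator for the anchored fraction is an assumption that happens to reproduce the reported 95.3%.

```python
# Subgenome sizes (Mb) and the flow cytometry genome size estimate (Mb).
sub_a_mb, sub_b_mb = 640.2, 577.6
flow_estimate_mb = 1278

anchored_mb = sub_a_mb + sub_b_mb
anchored_fraction = anchored_mb / flow_estimate_mb

# Gene counts per subgenome and the reported total.
genes_sub_a, genes_sub_b = 33608, 31691
genes_total = genes_sub_a + genes_sub_b

# BUSCO completeness of the annotation (embryophyta set, 1,614 genes).
busco_fraction = 1536 / 1614

# Assembly length implied by 732.02 Mb of repeats making up 56.09%.
implied_assembly_mb = 732.02 / 0.5609

print(round(anchored_mb, 1))              # 1217.8 Mb, i.e. about 1.22 Gb
print(round(anchored_fraction * 100, 1))  # 95.3
print(genes_total)                        # 65299
print(round(busco_fraction * 100, 1))     # 95.2
print(round(implied_assembly_mb))         # 1305 Mb, i.e. about 1.30 Gb
```

The repeat figure thus appears to be expressed relative to the 1.30 Gb contig-level assembly rather than the 1.22 Gb anchored to pseudochromosomes.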

Phylogenetic position of P. crassipes based on single-copy genes
To resolve the phylogenetic position of Commelinales in the commelinids, we first constructed a phylogenetic tree using a concatenated sequence of 180 single-copy orthologs identified by OrthoFinder [24] from the water hyacinth genome and 7 other representative members with high-quality genomes, using Acorus tatarinowii as the outgroup (Supplementary Fig. S3A). The phylogenetic tree revealed that Zingiberales and Commelinales were sister lineages of Arecales, and Poales is located in the out node, which supports the previous phylogenetic studies by Cheng et al. [25] and Wang et al. [26].
To validate the stability of the phylogenetic tree, we reconstructed a maximum likelihood phylogeny using a concatenated matrix comprising the 180 single-copy orthologs from the 9 genomes. A coalescent-based phylogeny was also generated through integration of the single-copy gene trees (Supplementary Fig. S3B, C). The topologies of both the coalescent and concatenated trees supported the aforementioned ortholog-based tree. Strong support was evident at each node within the coalescent tree (Supplementary Fig. S3B). The outcomes were also in accordance with a consensus tree generated using DensiTree [27] (Supplementary Fig. S3D).

Whole-genome duplications of P. crassipes
Whole-genome duplications (WGDs) cause rapid genome reorganization and structural variations to produce new chromosomal karyotypes [28, 29]. The analysis of genomic synteny showed excellent collinearity within the P. crassipes genome, which suggested recent genomic duplication events (Supplementary Fig. S4). Based on the syntenic blocks, we clustered the pseudochromosomes into ancestral chromosomes as A1 (Chr1A-4A), A2 (Chr5A-8A), B1 (Chr1B-4B), and B2 (Chr5B-8B). To confirm potential WGD events in the water hyacinth genome and estimate divergence time, we extracted syntenic gene pairs within the P. crassipes genome and their orthologs in 4 representative species of the commelinids (A. tatarinowii, Musa balbisiana, C. nucifera, and Pharus latifolius). The distribution of synonymous substitutions per site (Ks) indicated that at least 3 rounds of WGDs happened during P. crassipes evolution, consistent with the above synteny analysis results (Fig. 1B). However, according to the Ks peaks, the estimated divergence of water hyacinth and palms of Arecaceae (Ks = 1.04) occurred after divergence from Zingiberales (Ks = 1.17), which conflicts with the phylogenetic tree (Fig. 1A). The stronger collinearity between water hyacinth and palms seemed to support the result of the Ks distribution (Fig. 1C).
The conflict between the Ks inference and phylogenetic analysis might be caused by several factors, such as different substitution rates or structural genomic rearrangement rates [30, 31]. To test this hypothesis, we inferred the substitution rate in each branch with Bayesian methods implemented in BEAST2 [32]. Concordant with the hypothesis, the estimated substitution rate in the palm (0.67) was significantly lower than that in the ginger (1.18), indicating that the evolutionary rate variation across the taxa caused the bias of the Ks distribution.
We further extracted paralogs present in the genomes derived from the WGDs, aiming to elucidate the order and dates of the WGD events that transpired during the evolution of water hyacinth. Two prominent peaks of the Ks distribution of water hyacinth (Fig. 1B) suggested 2 relatively recent WGD events. These events encompassed the most recent tetraploidization event and a duplication event specific to the Commelinales lineage. Water hyacinth shared an ancient WGD with other commelinids, which has been recognized as the τ WGD event [26] (Supplementary Fig. S5). To confirm the ancient duplication process, we estimated the copy number in collinear regions between the water hyacinth and coconut genomes and found that some genomic regions indeed shared 4 corresponding copies in the 2 genomes (Fig. 1C). A case with the detailed genomic synteny between the 2 genomes (water hyacinth Chr3, Chr4, Chr6, Chr7 vs. coconut Chr4, Chr12, Chr16) is shown in Fig. 1C. Following the estimated time of the τ WGD (129-146 mya) based on the coconut genome [26], the tetraploidization event of water hyacinth was estimated to have occurred approximately 8-10 mya and the lineage-specific duplication 67-76 mya, both of which were comparable to the phylogenetic estimates (Fig. 1A). Differentiated transposable element (TE) contents were observed in the 2 subgenomes of water hyacinth, with a divergence rate ranging from 2% to 8% (subA) and 16% to 22% (subB). These differences resulted in the formation of a distinctive "bubble" peak within the TE profile, indicating a WGD pattern similar to that observed in the analysis of collinear paralogous pairs (Supplementary Fig. S6).
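The anchor-based dating described above can be illustrated with a simple proportional scaling, in which the age of an event is the age of the anchor (the τ WGD) multiplied by the ratio of the two Ks peaks. This is a sketch under the assumption of a constant substitution rate within the lineage; the text does not give the exact procedure or the Ks peak values. The function `scale_age` and the ratio 0.52 are hypothetical, the latter back-calculated here because it reproduces the reported 67-76 mya interval for the lineage-specific duplication.

```python
def scale_age(anchor_age_mya, ks_ratio):
    """Scale the age of an anchor event by the ratio
    Ks(event of interest) / Ks(anchor event), assuming a constant
    substitution rate between the two events."""
    if ks_ratio < 0:
        raise ValueError("Ks ratio must be non-negative")
    return anchor_age_mya * ks_ratio


# Age interval of the tau WGD from the text (million years ago).
TAU_WGD_MYA = (129, 146)

# Hypothetical Ks ratio for the Commelinales-specific duplication
# relative to the tau WGD (not reported in the text).
lineage_ks_ratio = 0.52

lineage_low, lineage_high = (scale_age(t, lineage_ks_ratio) for t in TAU_WGD_MYA)
print(round(lineage_low), round(lineage_high))  # 67 76
```

The same scaling with a ratio between roughly 0.06 and 0.07 would give ages in the 8-10 mya range reported for the tetraploidization event.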

Mass loss of disease-resistance genes in the P. crassipes genome
To estimate gene loss and gain during polyploidization, gene family sizes were determined by identifying protein domains in P. crassipes and other representative genomes. We first compared gene family sizes between tetraploid P. crassipes and the diploid Oryza sativa genome using a dot matrix plot (Fig. 2A). The results showed that the size of the majority of gene families in P. crassipes was almost 2 times that in O. sativa, consistent with their ploidy. The analysis also revealed that the size of several gene families (predominantly associated with disease resistance) in P. crassipes was significantly smaller than expected, for example, genes encoding NB-ARC (226 in P. crassipes vs. 522 and 480 in O. sativa and another diploid grass, Setaria italica, respectively), GRAS (62 vs. 65 and 59, respectively), peroxidase (162 vs. 158 and 170, respectively), and legume lectin (66 vs. 99 and 63, respectively) (Fig. 2E) [33][34][35][36][37]. We also compared the gene family sizes between water hyacinth and 2 other species of the grass family, the tetraploid weed Echinochloa oryzicola (Fig. 2A) and the crop durum wheat (Triticum turgidum), and found the same trend (Fig. 2B). For example, the number of NB-ARC genes in durum wheat (753) and E. oryzicola (318) was higher than in water hyacinth (226) (P < 0.001, Fisher's exact test). The results suggested a contraction of disease-resistance genes in the P. crassipes genome, consistent with the phenomenon observed in the Echinochloa weeds [38].
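The comparison of NB-ARC gene counts can be illustrated with a small two-sided Fisher's exact test. The Python sketch below is an assumption about how such a 2 × 2 table might be set up (family members vs. all other genes in each genome); the text does not describe the table design. The NB-ARC counts and the P. crassipes gene total come from the text, whereas the durum wheat gene total is a hypothetical placeholder.

```python
from math import exp, lgamma


def _log_choose(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    The p-value is the sum of hypergeometric probabilities of all tables
    with the same margins that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def log_prob(x):
        return (_log_choose(row1, x) + _log_choose(row2, col1 - x)
                - _log_choose(total, col1))

    log_observed = log_prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_value = 0.0
    for x in range(lowest, highest + 1):
        lp = log_prob(x)
        if lp <= log_observed + 1e-9:  # tolerance for floating-point ties
            p_value += exp(lp)
    return min(p_value, 1.0)


# Counts from the text: 226 NB-ARC genes among 65,299 P. crassipes genes,
# and 753 NB-ARC genes in durum wheat.
nbarc_pc, total_pc = 226, 65299
nbarc_wheat = 753
total_wheat = 66000  # hypothetical placeholder, not from the text

p = fisher_exact_two_sided(nbarc_pc, total_pc - nbarc_pc,
                           nbarc_wheat, total_wheat - nbarc_wheat)
print(p < 0.001)  # True
```

Working in log space keeps the binomial coefficients manageable for genome-scale gene counts.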
The same bioinformatics pipeline was used to compare the patterns of gene retention and loss in the commelinids for several other gene families. A higher number of P450 genes (574) was observed in P. crassipes compared with other species, possibly related to its capacity to survive in severely polluted conditions (Fig. 2B). In Arecales and Zingiberales, the increased number of GRAS genes implied a reduction of the gene family during divergence of P. crassipes (Fig. 2B). Consistent with the findings from previous studies [40][41][42], we observed a significant increase of disease-resistance genes in crops, including genes encoding legume lectin, peroxidase, and NB-ARC.

Ancestral karyotype evolution of the commelinids
Because Commelinales is a key phylogenetic branch within the commelinids clade, the high-quality reference genome generated in this study provides an opportunity to reconstruct the ancestral karyotype of the commelinids. We therefore compared 7 representative species with well-assembled genomes with P. crassipes (Fig. 3A). By inferring intergenomic gene collinearity, we mapped the 7 genomes onto P. crassipes and estimated the ratios of the best-matched orthologous regions between P. crassipes and C. nucifera (Arecaceae), Ananas comosus (Poales), and Zingiber officinale (Zingiberales) to be 4:2, 4:3, and 4:4, respectively, a result consistent with the number of WGDs experienced by the species (Supplementary Fig. S7A). Based on the gene collinearity of the 4 genomes (Fig. 3A, Supplementary Fig. S13a-c), we constructed an ancestral karyotype with 8 proto-chromosomes shared by the commelinids (Fig. 3B). Accordingly, we also reconstructed the ancestral karyotypes of 4 other species, O. sativa, M. balbisiana, Brachypodium distachyon, and P. latifolius. The reconstruction results clearly showed frequent chromosomal rearrangements in P. crassipes and genome structure changes in Zingiberales (Fig. 3B). A close check of shared collinearity between extant plant chromosomes identified the origin of certain extant chromosomes, thereby revealing their antiquity. One example is the region originating from the τ WGD, located on chromosome 6 of C. nucifera and chromosome 1 of P. crassipes (Fig. 3A). From the deduced ancestral state, the Commelinales proto-chromosomes were shaped through the τ WGD followed by 1 fission and 13 fusions to reach an n = 4 intermediate state. Then 3 fissions and 13 fusions accounted for the transition between the n = 4 intermediate state and the modern genome structure of 8 chromosomes in subgenomes A and B of P. crassipes. The fewest chromosomal rearrangements were observed in C. nucifera, consistent with its low nucleotide substitution rate, while Zingiberales underwent similarly massive chromosomal rearrangements.

Genetic diversity of P. crassipes
To estimate the genetic diversity of global water hyacinth, we collected an additional 9 lines from South America (Brazil), Asia (China and Malaysia), and Europe (Germany) (Fig. 4) and sequenced them with an average of 36× genomic coverage. Based on the single-nucleotide polymorphisms (SNPs) among the 9 genomes and the P. crassipes reference genome (Zijingang#1), we found a relatively low genetic diversity (π = 1.44 × 10^-3) of the global water hyacinth, compared to sorghum (3.05 × 10^-3) and other crops [43]. Based on principal component analysis (PCA), the water hyacinth native to Brazil (5 lines from different locations) seemed to have a relatively higher diversity than those from other countries (Malaysia, China, and Germany) (Supplementary Fig. S8), indicating a tendency toward greater genetic diversity of the species in its area of origin [44].
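As a reminder of what the nucleotide diversity statistic π measures, the following Python sketch computes it as the average number of pairwise differences per site. It is a deliberately simplified illustration on hypothetical aligned haploid sequences; the study itself derived π from genome-wide SNP calls, and the function name and toy data are assumptions.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average number of differences per site over all pairs of
    equal-length aligned sequences."""
    if len(sequences) < 2:
        raise ValueError("at least two sequences are required")
    n_sites = len(sequences[0])
    if n_sites == 0 or any(len(s) != n_sites for s in sequences):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(combinations(sequences, 2))
    differences = sum(
        sum(x != y for x, y in zip(first, second))
        for first, second in pairs
    )
    return differences / (len(pairs) * n_sites)


# Hypothetical toy alignment: 3 sequences, 4 sites.
toy_alignment = ["AAAA", "AAAT", "AATT"]
toy_pi = nucleotide_diversity(toy_alignment)
print(round(toy_pi, 4))  # 0.3333
```

In the toy alignment the three pairs differ at 1, 2, and 1 sites, giving 4 differences over 3 pairs × 4 sites.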
The phylogenetic tree of the global water hyacinth was consistent with the PCA results (Fig. 4), in which the Brazil lines encompassed the 3 lines from the 3 other countries. Of the 5 non-Brazilian lines, all except one of the lines from Germany (Germany_Rostock), that is, 4 lines including Zijingang#1, had almost the same nuclear genome as 2 Brazilian lines (Brazil_Vicosa and Brazil_Bombinhas). The chloroplast genomes of all 10 lines were further assembled, and surprisingly, only 2 chloroplast genomes (named chloroplast genomes A and B), which were nearly identical and differed by only a 1-bp indel (Fig. 4A), were obtained. In Brazil, chloroplast genomes A and B were observed in lines from the southern and northern areas, respectively, while all water hyacinth lines from other countries had genome A. Taken together, these results support Brazil as one of the origins of water hyacinth and suggest a global spread potentially by 1 or 2 genotypes.

Discussion
At present, all of the major commelinids crops (e.g., rice, wheat, and maize) [45][46][47] and other important economic crops of the clade such as pineapple and bananas [45, 48, 49] have had their genomes sequenced. However, the Commelinales order, an important phylogenetic node of the commelinids, has lacked a reference genome until now. Here we generated a high-quality reference genome of P. crassipes, representing the first genome of the Commelinales order. The availability of the genome provides a crucial missing link among different orders of the commelinids clade and is anticipated to facilitate studies of genome evolution.
The analysis of the ancient karyotype of the commelinids provides clear evidence for the clade having 8 proto-chromosomes. While the result differs from the 5 proto-chromosomes reported by other studies [50, 51], it is in line with the result based on a study of coconuts [26]. Apparently, the lack of high-quality genomes of representative species at crucial nodes of a phylogenetic tree hinders the inference of an evolutionary framework [52]. With the continuously increasing number of high-quality genomes, particularly the genomes filling the missing links, such as the water hyacinth genome generated in this study, gene collinearity and syntenic blocks between different species of the commelinids clade can be more clearly defined and characterized, shedding light on the plasticity of the commelinids genomes and their evolutionary trajectories.
Water hyacinth seemed to have experienced a significant reduction in disease-resistance genes (such as NB-ARC, GRAS, peroxidase, and legume lectin) during its evolutionary history. This could potentially be linked to fitness costs associated with allocating energy toward growth and reproduction processes [53][54][55]. Emerging data demonstrate that growth-defense tradeoffs allow plants to adjust growth and defense based on external conditions [56]. The contraction of disease-resistance genes has also been observed in other noxious weeds [38, 53-55]. It is reasonable to assume, therefore, that the loss of disease-resistance genes in the P. crassipes genome could be a result of natural selection to maximize and accelerate the growth and reproduction of P. crassipes. However, it is also possible that fewer disease-resistance genes evolved during its evolutionary history due to lower disease pressure in the surrounding environment (water) where P. crassipes grows. Significant contractions in certain disease-resistance gene families may imply stronger competitiveness and invasiveness of P. crassipes. While strong disease resistance is an important agronomic trait for crops, rapid growth and extensive reproduction may be necessary for weediness and invasiveness in general. Further investigation of the underlying mechanisms, such as fitness costs in weeds, will thus contribute to a better understanding of their invasive strategy and could potentially be used to develop effective weed management strategies.
This study revealed nearly identical nuclear and chloroplast genomes between some of the Brazilian water hyacinth and all the water hyacinth from other countries (except the German Rostock line), indicating the spread of a limited number of genotypes of water hyacinth from South America, where it has the highest genetic diversity. Such genetic uniformity has been observed in the global spread of water hyacinth and other invasive species [7, 57]. Bombinhas is a city in the southern region of Brazil, located in close proximity to the Itajaí Port, the sixth largest port in Brazil, established in the early 1860s. Given the strategic location of the Itajaí Port on the South American East Coast, there is a possibility that the early invasion abroad of water hyacinth could have been facilitated by transportation/immigration from the Itajaí Port, which has not been mentioned in Brazil's historical records. Although the Rostock line may indicate additional global dispersal of water hyacinth, our results indicated that the available non-Brazilian water hyacinth may have originated from Brazil.

Materials collection and sequencing
A wild P. crassipes plant (Zijingang#1) collected from the Zijingang Campus of Zhejiang University, Hangzhou, China, was used in construction of the reference genome. The additional 9 lines of P. crassipes were collected globally for phylogenetic analysis, with their detailed information available in Supplementary Table S3. Genomic DNA of P. crassipes was extracted from young leaves using the CTAB method for sequencing library construction. Following the standard protocols of Pacific Biosciences, DNA libraries for single-molecule real-time PacBio genome sequencing were constructed and circular consensus sequencing was performed on the PacBio Sequel2 platform (RRID:SCR_017990) for high-fidelity (HiFi) reads. Short-read libraries of P. crassipes were constructed according to Illumina's standard protocol, and paired-end reads (2 × 150 bp) were sequenced on an Illumina HiSeq X Ten platform (RRID:SCR_020131). With default parameters, raw PacBio subreads were filtered and corrected using the pbccs pipeline.
A Hi-C library was constructed using fresh young leaves of P. crassipes, which were fixed in 1% formaldehyde for crosslinking. Cells were lysed using a Dounce homogenizer and digested using the HindIII restriction enzyme. The DNA ends were filled and labeled with biotin and the filled-in HindIII sites were ligated to form NheI sites. Complexes with the biotin-labeled ligation products were purified and sheared, and the biotinylated Hi-C ligation products were pulled down and used to construct Illumina sequencing libraries [58].

Genome assembly
The HiFi reads were subjected to hifiasm (RRID:SCR_021069) [59] for de novo assembly in default mode. After mapping the long subreads to the initial assembly with minimap2 (RRID:SCR_018550) [60], racon [61] was used in 3 rounds of correction with default parameters. Based on the subassembly, clean Hi-C reads were analyzed and 3D-DNA [62] was used to scaffold contigs into pseudochromosomes, followed by manual correction with Juicer (RRID:SCR_017226) [63].
The above genome assembly was subjected to SubPhaser [64] to search for subgenome-specific sequences (k-mers), and then homoeologous chromosomes were assigned to 2 subgenomes (Supplementary Fig. S2). Based on the coverage depth of the short reads against the assembly, we manually corrected some errors with discrete chromatin interaction patterns. The assembled genome was subjected to BUSCO v5.5.0 (RRID:SCR_015008) [23] with embryophyta_odb10 to evaluate the completeness of the genome.
Tandem repeats were identified with Satellite Repeat Finder [70], and 1 type of centromere sequence was found. To precisely annotate the locations of the centromeric monomer CEN148, we calculated peak values in windows of the divided genome and merged the windows with the same kind of monomers.

Genome polyploidization analysis
We selected 4 representative species, including M. balbisiana, C. simplicifolius, P. latifolius, and A. tatarinowii, for comparative genomics analysis with P. crassipes, aiming to investigate the polyploidization event(s) that occurred and whether they were shared or not, as well as to infer the evolutionary trajectories that led to the formation of the current chromosomes. We first aligned protein sequences manually among species or subgenomes. WGDI (Whole-Genome Duplication Integrated analysis) was used to identify collinear blocks, which are the genomic regions containing collinear genes according to the combined information of gene similarity and gene order, within and between each genome [81]. The maximum gap allowed between collinear genes on a chromosome was set to 50 intervening or noncollinear genes. To help date evolutionary events and identify collinear genes produced by different events, polyploidization or speciation, Ks between collinear genes was estimated using KaKs_calculator with the NG model [82]. Given the possible effects of diverse nucleotide substitution rates among different lineages on phylogeny estimation, the polyploidization shared between water hyacinth and coconut was used as an anchor to date duplication events that occurred in water hyacinth.

Analysis of ancestral karyotypes and chromosome evolutionary trajectories
To investigate the chromosome evolution of commelinid genomes, we selected representative species (Fig. 3B) from 4 orders with chromosome-level genome assemblies. We identified homologous proteins between extant genomes and the reconstructed commelinid karyotypes and then used WGDI to detect syntenic blocks as described above [81]. Then, dot plots were created to show synteny, and the chromosomal rearrangements were reconstructed.

Resequencing and variant calls
For resequencing, short-read libraries of additional lines of P. crassipes collected globally were sequenced on an Illumina HiSeq X Ten platform (Supplementary Table S3). Raw data were first filtered by the NGSQC Toolkit (v2.3.348) [86]. Clean paired-end reads of each accession were then aligned to the P. crassipes reference genome using Bowtie2 with default parameters. A custom pipeline [87] was used in calling and filtering variants. Low-quality variants (minor allele frequency < 0.01 and missing rate > 30%) were further removed.
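The final filtering step can be sketched as a per-variant check on minor allele frequency (MAF) and missing rate. The Python example below is not the custom pipeline cited above; it assumes that a variant is discarded if it fails either threshold, and that genotypes are coded diploid-style as alternate-allele counts (0, 1, 2) with `None` for missing calls. The function name and toy genotypes are hypothetical.

```python
def passes_filters(genotypes, min_maf=0.01, max_missing=0.30):
    """Return True if a variant is retained.

    genotypes: one entry per line, the alternate-allele count (0, 1, or 2)
    or None when the genotype call is missing.
    """
    called = [g for g in genotypes if g is not None]
    n_missing = len(genotypes) - len(called)
    # Discard variants with too many missing calls (or no calls at all).
    if not called or n_missing / len(genotypes) > max_missing:
        return False
    alt_frequency = sum(called) / (2 * len(called))
    maf = min(alt_frequency, 1 - alt_frequency)
    return maf >= min_maf


# Hypothetical genotype vectors for 10 lines.
polymorphic = [0, 0, 1, 2, 0, 0, 0, 0, 0, None]               # 10% missing
monomorphic = [0] * 10                                        # MAF = 0
mostly_missing = [1, None, None, None, None, 0, 0, 0, 0, 0]   # 40% missing

print(passes_filters(polymorphic))     # True
print(passes_filters(monomorphic))     # False
print(passes_filters(mostly_missing))  # False
```

With only 10 lines, any variant carrying at least one alternate allele has a MAF of at least 0.05, so in practice the MAF threshold mainly removes sites that are monomorphic among the called genotypes.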

Phylogenetic analysis and PCA
The phylogenetic analysis was performed on the full set of all 9 water hyacinth lines. A phylogenetic tree was constructed based on 6.5 million SNPs using FastTree (RRID:SCR_015501) [75] and visualized in iTOL (RRID:SCR_018174) [76]. All SNPs across the lines were analyzed using the R package SNPRelate to conduct PCA [88].

Chloroplast genome assembly and annotation
The clean Illumina sequencing reads of all 9 lines were used for de novo assembly with GetOrganelle (RRID:SCR_022963) [89]. Genome annotation was performed with the GeSeq (RRID:SCR_017336) online tool [90]. A custom script was used to filter out duplicate annotated genes. Multiple sequence alignment of chloroplast genomes was performed with MAFFT (Multiple Alignment based on Fast Fourier Transform) (RRID:SCR_011811) [91].

Additional Files
Supplementary Fig. S1. Genome size estimated based on k-mer analysis and flow cytometry analysis. A, k-mer analysis result with k = 17. B-D, three replicates of the flow cytometry analysis.

Figure 1: Phylogeny and evolution of the P. crassipes genome. (A) Single-copy gene-based ultrametric phylogenetic tree and divergence times of P. crassipes and other representative species of the commelinids with A. tatarinowii as an outgroup. (B) Distribution of synonymous substitutions per site (Ks) of paralogous genes in collinear regions of P. crassipes and orthologous genes between P. crassipes and other members of the commelinids (C. nucifera, C. simplicifolius, Z. officinale, M. balbisiana, P. latifolius, and A. tatarinowii). (C) Dot plots showing the conserved genomic synteny between P. crassipes and C. nucifera. An example of a conserved syntenic region originating from the τ WGD event is marked with a rectangle.

Figure 2: Changes of gene family size during genome polyploidization of P. crassipes. (A) Dot matrix plot and distribution of fold changes of gene family sizes in P. crassipes compared with diploid O. sativa and tetraploid E. oryzicola. Regarding the distribution of gene family sizes (subfigures at lower right corner), the highest percentage was observed in the gene families of P. crassipes that were 2 times bigger in size than those of O. sativa (left) and the same size as those of E. oryzicola (right). (B) Comparison of disease resistance-related gene family sizes between P. crassipes and other commelinids species. + and − indicate increase and decrease in size, respectively, relative to P. crassipes. *P < 0.01, **P < 0.001, ***P < 0.0001, Fisher's exact test. (C-E) Synteny retention ratio of different paralogous pairs in 10 gene families after polyploidization of P. crassipes. The graphs show the percentage of retained gene pairs that experienced the 2 polyploidization events (C, A1:A2:B1:B2 = 1:1:1:1), 1 of the 2 events (D, A1:A2 or B1:B2 = 1:1), or 2 subgenomes (E, subA:subB = 1:1). The dashed lines represent the average retention ratio of genes across the genome. LLD: legume lectin domain.

Figure 3: Inference of proto-chromosomes and ancestral karyotypes of the commelinids. (A) Identification of proto-chromosomes based on synteny regions among extant chromosomes. Alignments between proto- and extant chromosomes shown in different colors indicate the different origination from the proto-chromosomes. Ac: A. comosus; Cn: C. nucifera; Pc: P. crassipes. (B) Reconstruction of ancestral karyotypes and their phylogeny of the commelinids. Ancestral chromosomes at specific evolutionary nodes were inferred and denoted with different colors. Whole-genome duplication and triplication events are shown in red and blue circles, respectively.

Figure 4: Genetic diversity and phylogeny of global P. crassipes. (A) The collection locations of the 10 water hyacinth lines used in this study are indicated by circles and the chloroplast genomes (A and B) are labeled with two different colors. (B) A phylogenetic tree of the 9 lines built based on their nuclear genomic SNPs relative to the reference Zijingang genome.

Table 1: Summary of P. crassipes plant materials collection, genome sequencing, and annotation in this study